A Compress-Based Association Mining Algorithm for Large Dataset

Authors

  • Mafruz Zaman Ashrafi
  • David Taniar
  • Kate Smith-Miles
Abstract

Association mining is one of the primary sub-areas of data mining. The technique has been used in numerous practical applications, including consumer market basket analysis, inferring patterns from web page access logs, network intrusion detection, and pattern discovery in biological applications. Most traditional association-mining algorithms assume that the whole dataset can be loaded into main memory. Hence, problems arise when such algorithms are applied to large datasets. In this paper we present a new algorithm for association mining. Our algorithm remains efficient when the dataset is too large to be loaded into main memory. The proposed algorithm also reduces the frequent-itemset search space by eliminating non-frequent 1-itemsets after the first pass. Our performance evaluation shows that the algorithm outperforms the Apriori algorithm on different datasets.
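The abstract does not describe the algorithm's internals, so the following is only a minimal sketch of the one step it names: dropping non-frequent 1-itemsets after a first counting pass. The function names (`count_items`, `prune_infrequent_items`), the in-memory list of transactions, and the absolute `min_support_count` threshold are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import FrozenSet, Hashable, Iterable, List, Set, Tuple


def count_items(transactions: Iterable[Iterable[Hashable]]) -> Counter:
    """First pass: count how many transactions contain each item."""
    counts: Counter = Counter()
    for transaction in transactions:
        # set() ensures an item repeated inside one transaction counts once.
        counts.update(set(transaction))
    return counts


def prune_infrequent_items(
    transactions: List[Iterable[Hashable]],
    min_support_count: int,
) -> Tuple[Set[Hashable], List[FrozenSet[Hashable]]]:
    """Keep only items whose support meets the (hypothetical) threshold.

    Returns the frequent 1-itemsets and the transactions with every
    non-frequent item removed. Transactions that become empty are dropped.
    """
    counts = count_items(transactions)
    frequent = {item for item, n in counts.items() if n >= min_support_count}

    pruned: List[FrozenSet[Hashable]] = []
    for transaction in transactions:
        kept = frozenset(transaction) & frequent
        if kept:
            pruned.append(kept)
    return frequent, pruned


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["e"]]
    frequent_items, reduced = prune_infrequent_items(data, min_support_count=2)
    print(sorted(frequent_items))
    print([sorted(t) for t in reduced])
```

Because any superset of a non-frequent item is itself non-frequent, later passes only need to consider the pruned transactions, which is why this step shrinks the search space. A real out-of-core version would stream transactions from disk rather than hold them in a list.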


Similar resources

Using a Data Mining Tool and FP-Growth Algorithm Application for Extraction of the Rules in two Different Dataset (TECHNICAL NOTE)

In this paper, we aim to improve association rules for use in recommenders. Recommender systems provide a method for creating personalized offers. One of the most important types of recommender system is collaborative filtering, which mines user information and offers users appropriate items. Among the data mining methods, finding frequent item sets and ...

Data sanitization in association rule mining based on impact factor

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

RealKrimp - Finding Hyperintervals that Compress with MDL for Real-Valued Data

The MDL Principle (induction by compression) is applied with meticulous effort in the Krimp algorithm for the problem of itemset mining, where one seeks exceptionally frequent patterns in a binary dataset. As is the case with many algorithms in data mining, Krimp is not designed to cope with real-valued data, and it is not able to handle such data natively. Inspired by Krimp’s success at using ...

Particle swarm Optimization Based Association Rule Mining

Association rule mining is one of the most widely used and simplest approaches for finding frequent item sets in large collections of data. However, generating frequent item sets from a large dataset using association rule mining is not very efficient. This can be improved by using the particle swarm optimization (PSO) algorithm. The PSO algorithm is a population-based evolutionary heuristic search method used for ...

SAS: Implementation of scaled association rules on spatial multidimensional quantitative dataset

Mining spatial association rules is one of the most important branches of Spatial Data Mining (SDM). Because of the complexity of spatial data, a traditional method of extracting spatial association rules is to transform the spatial database into a general transaction database. The Apriori algorithm is currently one of the most commonly used methods for mining association rules. But a sh...



Publication date: 2003